Causal inferences and real-world evidence: A comparative effectiveness evaluation of abiraterone acetate against enzalutamide

Regulatory authorities are recognizing the need for real-world evidence (RWE) as a complement to randomized controlled trials in the approval of drugs. However, RWE needs to be fit for regulatory purposes. There is an ongoing discussion regarding whether pre-publication of a protocol on appropriate repositories, e.g. ClinicalTrials.gov, would increase the quality of RWE or not. This paper illustrates that an observational study based on a pre-published protocol can entail the same level of detail as a protocol for a randomized experiment. The strategy is exemplified by designing a comparative effectiveness evaluation of abiraterone acetate against enzalutamide in clinical practice. These two cancer drugs are prescribed to patients with advanced prostate cancer. Two complementary designs, including pre-analysis plans, were published before data on outcomes and proxy-outcomes were obtained. The underlying assumptions are assessed and both analyses show an increased mortality risk from being prescribed abiraterone acetate compared to enzalutamide.


Introduction
There is a growing interest in using real-world evidence (RWE) for regulatory purposes. The belief is that real-world data (RWD), or observational data, can make drug development more efficient and speed up patient access to new drugs. The European Medicines Agency (EMA) has therefore been paving the way for RWE. EMA provides incentives to use RWE for regulatory approval through, for example, the Adaptive Pathways pilots introduced in March 2014 [1]. The Adaptive Pathways offered an iterative process for regulatory approval in which data from randomized controlled trials (RCTs) are supplemented with RWD. Additionally, EMA is revising pharmaceutical legislation to acknowledge the possibilities arising from RWD analyses to support the development, authorization, and use of medicines [2] (cf. Burns et al. [3]).
However, there is concern that analyses based on observational data suffer from substantial biases [4]. Consequently, there are numerous initiatives for methodological improvements to, among other things, control biases. One such initiative is IMI GetReal: a joint effort between EMA, the industry, and the EU that offers an exchange of insight and know-how in using RWD in drug development [5]. Similar initiatives have been set up in the US and Asia [6]. The result is more authorizations of drugs and extensions of indications based on RWE. For new products and extensions of indications submitted to the Agency in 2018 and 2019, Flynn et al. [7] find that 40% of the initial marketing authorization applications and 18% of applications for products currently on the market contained RWE.
As RWD is increasingly accepted as evidence in the regulatory process, there is an ongoing discussion about whether or not the requirements for generating this evidence should be the same as for RCTs. One such requirement is that of pre-published protocols in an appropriate repository, such as ClinicalTrials.gov. However, there has yet to be consensus on the content required in such a pre-published protocol [3]. As a potential input to the discussion, this paper illustrates that an observational study based on a pre-published protocol can entail the same level of detail as a protocol for an RCT.
To this end, we present the first set of results from a methodological project funded by the Swedish Dental and Pharmaceutical Benefits Agency (TLV). TLV is responsible for determining which pharmaceutical products, care-related medical devices, and dental care procedures should be subsidized by the Swedish state. The objective of the project was to serve as a template for how to use the Swedish administrative population registries, in combination with quality registers, to conduct comparative effectiveness evaluations of interventions. The first part of the project consists of a comparative effectiveness evaluation of abiraterone acetate (AA) against enzalutamide (ENZ) in clinical practice. These two cancer drugs are given to patients who have advanced prostate cancer. The second part consists of a comparative effectiveness evaluation of these two drugs against standard of care.
In this paper, the results from the first part of the project are discussed. The designs were described in two pre-analysis plans [8,9], both published before access to outcome data.
The main advantage of requiring a detailed protocol is that it restricts the potential for p-hacking, forking paths, etc., which is a problem in empirical research; see, e.g., Amrhein et al. [10] and Wasserstein et al. [11]. Thus, one can argue that analyses based on detailed pre-published protocols increase the analyses' objectivity. An objection to publishing a detailed protocol is that it restricts the researchers' ability to incorporate new information that becomes available only after they have access to all data. With access to data, the researcher may observe irregularities, enabling them to find a suitable model that will increase both the validity and precision of the analysis. As we are prone to see patterns where there are none (i.e. apophenia), we believe this strategy of finding suitable models is risky and prone to providing invalid inferences. Furthermore, the fact that an analysis is based on a detailed protocol does not prohibit additional exploratory analyses using better-suited models to incorporate new information.
A high-quality study should be based on a carefully crafted design. This requires (i) an understanding of the assignment to treatment (i.e. an understanding of prescription practice in the applications), (ii) clear statements of the assumptions made, and (iii) details of how these assumptions should be assessed. Requiring a detailed protocol in which these three steps are discussed forces researchers to "think beforehand", which may increase quality because the design must be carefully thought through. At the same time, as a consequence of "tying oneself to the mast", it provides an objective analysis. The readers themselves need to judge whether the illustration in this paper provides support for this claim.

Prostate cancer and novel hormone treatment
Prostate cancer (PC) is reported to be the most commonly diagnosed form of cancer in Sweden. In 2016, 10,473 patients were diagnosed, creating a total pool of 107,752 PC patients. It is also the diagnosis with the largest number of deaths among all main diagnoses of men in Sweden, and almost all deaths arise when patients have progressed to the advanced metastatic castrate-resistant prostate cancer (mCRPC) stage. Approximately 10-20 percent of patients with PC develop mCRPC within five years of follow-up after initial therapy [12][13][14].

The use of the data in this study is approved by the ethical review board. Therefore, the data cannot be made publicly available. The ethical permission to replicate our research, and the data for the replication, need to be requested by the researchers themselves: the permission at registrator@etikprovningen.se, and the data at registerservice@socialstyrelsen.se, uppdrag@scb.se, or npcr@npcr.se for the NBHW, Statistics Sweden, and NPCR, respectively. The practical arrangements for accessing the data will, to some extent, depend on the location of the researcher.
Various treatment alternatives are available for patients with mCRPC. In the last two decades, chemotherapy and novel hormone treatment (NHT) medications (of which AA and ENZ were the first two) have revolutionized treatment of mCRPC patients [15][16][17][18][19][20]. Patients with mCRPC have a poor prognosis, and their quality of life deteriorates as the disease progresses. When used for metastatic hormone-resistant prostate cancer, both AA and ENZ have been shown to reduce mortality and improve overall survival [21].
Several indirect analyses have compared overall survival in patients treated with ENZ or AA; see, e.g., Chopra et al. [22], Fang et al. [23] or McCool et al. [24]. To the best of our knowledge, few studies have evaluated the comparative effectiveness of AA and ENZ on overall survival in a real-world setting. Recently, however, Tagawa et al. [25] and Schoen et al. [26] found an improved survival of ENZ over AA, using the Veterans Health Administration (VHA) database in the US.
This study evaluates the use of ENZ and AA in clinical practice from June 2015, corresponding to the date when these drugs were first reimbursed for mCRPC patients in Sweden. Data are collected from population registers administered by the National Board of Health and Welfare (NBHW), Statistics Sweden (SCB), and the National Prostate Cancer Register (NPCR). The population is restricted to all men in the NBHW register with a prostate cancer diagnosis before 2017, as only these patients were expected to progress to mCRPC during the period for which we have outcome data. Before June 2015, almost no patients were prescribed any of the drugs as these were not yet reimbursed. The Dental and Pharmaceutical Benefits Agency (TLV) approved reimbursement for mCRPC patients who had failed androgen deprivation therapy (ADT) and were not yet suited for docetaxel (pre-chemotherapy), and for patients who had failed docetaxel (post-chemotherapy). Once reimbursed, the number of prescriptions increased rapidly. On 15 June 2018, AA was additionally reimbursed as an add-on to ADT in patients with high-risk metastatic hormone-sensitive prostate cancer (mHSPC). Therefore, we restricted the population to patients prescribed AA or ENZ from 1 June 2015 to 15 June 2018.
We estimate the effect on one primary outcome and two secondary outcomes to capture different aspects of morbidity. The primary outcome is all-cause mortality; the two secondary outcomes are skeleton-related events (SRE) and severe pain. The designs of the two complementary analyses are described in two pre-analysis plans [8,9]. In order for the designs to be valid, there needs to be some randomness in the prescription given observed covariates. Section 2 provides arguments for why the doctors' prescription is random given the covariates used to control for the patients' health.
This study contributes to the growing literature on simultaneously using matched samples and instrumental variables analysis [27].
Design 1 (Johansson et al. [8]) presents a matching design and protocol for a regression-adjusted matching estimator, where balance on observed covariates is obtained. Before publication, the design was discussed with two pharmaceutical companies, who had no objections and no requirement for additional covariates to be balanced.
Design 2 (Johansson et al. [9]) is based on the same study population and makes use of differences in prescription practices across 21 county councils in an Instrumental Variables (IV) analysis.
The quality of healthcare, however, affects health, so quality differences across county councils can be related to the prescription of the two drugs.Thus, it is only relevant to believe the instruments to be 'randomly assigned', given a set of controls for these quality differences.Johansson et al. [9] show the validity of the IV, present a sensitivity analysis of the exclusion restriction of the instruments, and pre-specify the IV models used for the analysis presented in this paper.
Inference from observational studies can potentially suffer from many types of biases, one of which concerns the potential objectivity of researchers. In this paper, access to outcome data was obtained after the publication of both pre-analysis plans (see S1 Text for documentation). The implication is that results from the analyses are as objective as those from an RCT. Furthermore, when adding mortality data, data from NPCR were also added. These data contain more detailed information on the patients' prostate cancer health, allowing us to assess the identifying assumptions in the designs and analyses.
The results from the two analyses show an increased mortality risk from prescribing AA compared to ENZ, and support the findings in Tagawa et al. [25] and Schoen et al. [26]. The matched sampling design analysis also suggests an increased risk of skeleton-related events. Further, the study shows the strength of using a matched sample design and IV strategy simultaneously. It also confirms previous results of lack of precision using the IV analysis [27].
The remainder of the paper continues as follows. Section 2 describes the data. Section 3 describes the matched sample design and the regression adjustment analysis, while section 4 describes the IV analysis. The results from both analyses are presented in section 5. In addition to the results on the mortality (section 5.1) and morbidity outcomes (section 5.2), this section assesses the potential problem of confounders (section 5.3) and briefly discusses the results from exploratory analyses (section 5.4). The paper concludes with a discussion in section 6.

Data
The specific data we use and the way we process the data have obtained ethical approval from the Ethical Committee in Uppsala (ref. Dnr 2017/482). Data are collected only from population registers administered by the NBHW and SCB and from the quality register administered by NPCR, which means that no informed consent was needed according to the Swedish Act (2003:460) on Ethical Review of Research Involving Humans.
The data are linked using unique serial numbers created by SCB. From the NBHW, we link data from the inpatient care register and the pharmaceutical register. All inpatient and outpatient care visits in Sweden and all prescribed drugs are listed in these registers. The inpatient care register contains, among other things, information on all diagnoses (using the ICD classifications) and the dates of admission and discharge. The pharmaceutical register includes the dates of prescribing and dispensing the drugs and the ATC class of the drug.
From SCB, we link data from a census conducted every fifth year over the period 1960-1990, labour statistics based on administrative sources (RAMS) for the period 1985-2009, and data from LISA covering the period 1990-2015. LISA is an extensive database that links a large set of administrative registers using the Swedish person id in the linkage. The linked data contain each individual's disposable income, labour income, social insurance payments, capital income, labour market status, year of birth, education, marital status, etc., from 1960 to 2015.
The population under study is defined using the cancer register. We first identify all men with a prostate cancer diagnosis before 2017 and the year of their diagnosis. We identify 243,535 unique patients with prostate cancer (ICD-10 code C61.9 or earlier codes ICD-7 177 and ICD-9 185.9). The population is then restricted to all men collecting a prescription of AA or ENZ from 1 June 2015 to 15 June 2018. The reasons for the time restriction are: (i) that almost no one was treated with these drugs before the reimbursement of AA and ENZ in June and July 2015, respectively, and (ii) that, on 15 June 2018, AA was additionally reimbursed in combination with ADT in patients with high-risk metastatic hormone-sensitive prostate cancer (mHSPC) who are unsuited for docetaxel.
The restriction leaves us with a total of 4,601 patients in the study population. Of these, 485 patients, or around 10%, were prescribed both AA and ENZ over the years. We allocate these patients to the two samples, AA-takers and ENZ-takers, based on their first prescription of one of the two drugs (intention-to-treat analyses). The reason for this choice is that the first treatment can be considered as 'randomized' in the design, while the second cannot. However, we also present results from a sensitivity analysis in which we have excluded the 485 patients who were prescribed both AA and ENZ.
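The allocation rule can be illustrated with a minimal Python sketch. It is not the code used in the study: the record layout, the patient identifiers, and the function names are hypothetical, and only the study window and the first-prescription rule are taken from the text.

```python
from datetime import date

# Hypothetical dispensing records: (patient_id, drug, dispensing_date).
records = [
    (1, "ENZ", date(2016, 3, 1)),
    (1, "AA", date(2017, 1, 15)),   # later switch; ignored for allocation
    (2, "AA", date(2015, 9, 10)),
    (3, "ENZ", date(2018, 7, 1)),   # outside the study window
]

WINDOW_START = date(2015, 6, 1)
WINDOW_END = date(2018, 6, 15)


def in_window(day):
    return WINDOW_START <= day <= WINDOW_END


def allocate_by_first_prescription(rows):
    """Return {patient_id: drug} based on the earliest in-window record."""
    first = {}
    for pid, drug, day in rows:
        if in_window(day) and (pid not in first or day < first[pid][1]):
            first[pid] = (drug, day)
    return {pid: drug for pid, (drug, _) in first.items()}


def switchers(rows):
    """Patients with in-window records of both drugs."""
    seen = {}
    for pid, drug, day in rows:
        if in_window(day):
            seen.setdefault(pid, set()).add(drug)
    return {pid for pid, drugs in seen.items() if len(drugs) > 1}


groups = allocate_by_first_prescription(records)
excluded_in_sensitivity_analysis = switchers(records)
```

In this toy example, patient 1 is allocated to ENZ despite the later AA prescription, and is the only patient who would be dropped in the sensitivity analysis.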
For this population, the year of the cancer diagnosis ranges between 1986 and 2016. Consequently, there is substantial variation in the time to be prescribed AA or ENZ from the date of cancer diagnosis. This so-called waiting time is most likely an important covariate.
As seen from Table 1, the prescription of the two drugs varies over the 21 county councils, hereafter denoted counties, which are the bodies responsible for healthcare in Sweden. The fact that the prescription varies substantially over counties is a notable finding, as it suggests differences in the prescription that may not be related only to patients' health status. From this table, we can see that in total, 24 percent of the patients were prescribed AA, but the proportion ranges from 8 percent in Skåne to 61 percent in Kronoberg.
We have tried to understand the reason for this considerable regional variation by interviewing officials at TLV and doctors. The officials stated that it might be due to the negotiated price agreements between the county councils, the drug company and the Pharmaceutical Benefits Board. Confronted with this proposition, the doctors agreed; however, they stated that it also could be due to differences in hospital recommendations or habits. Unfortunately, the agreements are confidential, which means that we cannot provide evidence on price variation. However, we did see variation in recommendations across the county councils.

We construct 23 continuous covariates measuring a patient's general health and health progression before diagnosis and between diagnosis and treatment, including the number of visits at different periods and the number of days in inpatient care. The inclusion of the Elixhauser comorbidity index at diagnosis also captures the general health status.
Further, we include covariates separately for diseases deemed most important for prescription: cardiovascular diseases, metastases, diabetes, fatigue, and osteoporosis (see S1 Table for the included ICD codes). This results in 16 continuous covariates on the number of visits and eight indicators on whether or not a patient has had the specific diagnosis. We also derive three covariates measuring the number of collected prescriptions of medications related to cardiovascular diseases and diabetes during the three years before the treatment.
We create 91 variables intended to describe the socioeconomic status of the patient three years before diagnosis and three years before treatment, respectively, with SCB data from 1991 until 2015. For a few patients with diagnoses before or after these years, information on socioeconomic status is given by values from a year as close to the diagnosis as possible. These variables include information on age, marital status, educational level, pensions, income, sick leave, and other security benefits for the patient and their household. The mean over the three preceding years is used in the few cases of partly missing values on continuous covariates.
Educational level is the highest completed education and is classified as less than, equal to, or more than secondary school. There were 33 observations where education was reported as being unknown. Here a five-nearest neighbour approach is used to impute the missing values. The most common value of the five patients, i.e. neighbours, who are most similar in income, pension, age, and country of birth, is imputed for every missing value of the categorical variable measuring educational level.
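The imputation step can be sketched as follows. This is an illustration under assumptions, not the study's implementation: the text does not state the distance measure or how the features are scaled, so the sketch assumes numeric features that have already been standardized and uses Euclidean distance. The function name and the example values are hypothetical.

```python
from collections import Counter
import math


def impute_education(target, donors, k=5):
    """Impute a categorical value as the most common value among the
    k nearest donors.

    target: tuple of (assumed standardized) numeric features for the
            patient whose educational level is unknown.
    donors: list of (features, education) pairs with observed education.
    """
    ranked = sorted(donors, key=lambda d: math.dist(target, d[0]))
    votes = Counter(level for _, level in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical donors described by two standardized features.
donors = [
    ((0.0, 0.0), "secondary"),
    ((0.1, 0.0), "secondary"),
    ((0.0, 0.2), "more than secondary"),
    ((0.3, 0.1), "secondary"),
    ((0.2, 0.2), "less than secondary"),
    ((5.0, 5.0), "more than secondary"),
    ((6.0, 5.0), "more than secondary"),
]
imputed = impute_education((0.05, 0.05), donors)
```

The two distant donors do not enter the vote, so the imputed level is driven only by the five most similar patients.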
One potential problem is that our data do not record whether patients have received chemotherapy. Our inclusion of the time between diagnosis and treatment as a covariate is intended to control for this fact. In addition, as the quality of healthcare affects health, and may be related to the prescription of the two drugs, we also include the historical county-specific mortality related to prostate cancer in the year of diagnosis.
All 144 covariates with descriptions are presented in S2 Table. In the spring of 2018, 12 urologists and oncologists were asked about their prescription practices. Table 2 provides summary statistics of the AA and ENZ patients for a subset of the 15 variables judged by them to be the most important for the differences in prescription of the two drugs. In addition, the table also includes county-specific mortality rates and years to treatment.
From this table, we can see that the two groups are very similar. There are no significant differences in average age, educational level, or marital status. The main differences between the groups are that the ENZ patients have: (i) a higher prevalence of acute myocardial infarction, (ii) a higher prevalence of diabetes prescriptions, (iii) a lower prevalence of metastases, and (iv) a shorter time in years to treatment from diagnosis.

Outcome data
We have one primary and two secondary outcomes to capture different aspects of morbidity. The primary outcome is all-cause mortality; the two secondary outcomes are skeleton-related events (SRE) and severe pain.
All-cause mortality is defined using an indicator variable 'DEAD', taking value one for dead patients and zero for patients who are alive at the end of each 30-day period after beginning AA or ENZ treatment. Patients are assumed to suffer an SRE if they experience hospitalization because of pathologic fracture (ICD-10 codes M485, M495, M844, and M907) or spinal cord compression (G550, G834, G952, G958, G959, and G992) [28]. The SRE indicator is valued as one for periods with such hospitalizations and zero for other periods. Patients are assumed to suffer severe pain if they receive prescriptions for neuropathic pain, i.e. opiates in combination with tramadol and paracetamol (ATC codes N02AA, N02AX02, and N02BE01). The 'Pain' indicator is valued as one for periods in which the patient has received such a prescription and zero for other periods. With one primary and two secondary outcomes, we use Bonferroni-corrected significance levels with a five percent overall level. This means that the level for each single outcome is 1.67% (= 100 × 0.05/3).
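A small Python sketch of the two definitions above, the 30-day 'DEAD' indicator and the Bonferroni-corrected level, is given for illustration. The function names are hypothetical, and the treatment of a death that falls exactly on a period boundary (counted as dead in that period) is an assumption not stated in the text.

```python
from datetime import date


def per_outcome_level(overall_level, n_outcomes):
    """Bonferroni correction: split the overall level equally."""
    return overall_level / n_outcomes


def dead_indicators(start, death, n_periods, period_days=30):
    """DEAD_t = 1 if the patient has died by the end of 30-day period t.

    start: date of the first AA or ENZ prescription.
    death: date of death, or None if the patient is alive at follow-up.
    Assumption: a death exactly at a period boundary counts in that period.
    """
    indicators = []
    for t in range(1, n_periods + 1):
        died = death is not None and (death - start).days <= t * period_days
        indicators.append(1 if died else 0)
    return indicators


level = per_outcome_level(0.05, 3)  # about 0.0167, i.e. 1.67%
example = dead_indicators(date(2016, 1, 1), date(2016, 3, 15), n_periods=4)
```

A patient who dies 74 days after the first prescription is thus coded as alive in the first two periods and dead from the third period onwards.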
Of the 4,601 patients, 3,658 have a date of death (80 percent). Among those who died, the mean time between prescription and death was about 20 months. Of the 3,658 patients, 3,104 have prostate cancer as one of multiple causes of death, and 2,850 patients have prostate cancer as their main cause of death. Only 502 unique patients have an indication of SRE, and only 13 unique patients have an indication of pain.

Matched sample analysis
The matched sample was created by matching on a generalized Mahalanobis distance metric, known as genetic matching [29], with the objective of estimating an average treatment effect. As the two groups were not completely balanced, a one-to-one matching with replacement from a genetic matching algorithm was used.
Observations with a distance of more than three standard deviations for any of the included covariates were excluded. This led to 85 dropped observations and 4,516 observations in the matched sample.

The regression estimator
Taking stock of the Neyman-Rubin potential outcomes framework [30][31][32], we define the potential outcome if a patient had been given AA as $Y(1)$, and as $Y(0)$ if he had been given ENZ. Our interest is in estimating the conditional average treatment effect (CATE) for the population of $n$ individuals in our sample, formally defined as

$$\mathrm{CATE} = \frac{1}{n}\sum_{i=1}^{n} E\left[Y_i(1) - Y_i(0) \mid x_i\right].$$

Let $x_i$ be the observed covariates and let $W_i = 1$ if a patient was prescribed AA and $W_i = 0$ if prescribed ENZ. Under the Stable Unit Treatment Value Assumption (SUTVA [33]), the observed outcome, $Y_i$, is equal to the potential outcome, thus

$$Y_i = W_i Y_i(1) + (1 - W_i) Y_i(0).$$

The analysis sample is formed by finding the closest ENZ patient to an AA patient, and vice versa, with respect to their covariates. In this procedure, we are matching with replacement. Formally, the unobserved outcomes $Y_i(0)$, $i = 1, \dots, n_1$, for the AA patients with covariates $x_i$, are imputed as follows

$$\hat{Y}_i(0) = Y_j \quad \text{with} \quad j = \arg\min_{k:\, W_k = 0} \lVert x_i - x_k \rVert,$$

where $\lVert\cdot\rVert$ is the generalized Mahalanobis distance used in the genetic matching algorithm.
The unobserved outcomes $Y_j(1)$, $j = 1, \dots, n_0$, for the ENZ patients, with covariates $x_j$, are then imputed as

$$\hat{Y}_j(1) = Y_i \quad \text{with} \quad i = \arg\min_{k:\, W_k = 1} \lVert x_j - x_k \rVert.$$

The one-to-one matching estimator of CATE is then defined as

$$\hat{\tau} = \frac{1}{n}\left[\sum_{i=1}^{n_1}\left(Y_i - \hat{Y}_i(0)\right) + \sum_{j=1}^{n_0}\left(\hat{Y}_j(1) - Y_j\right)\right], \tag{3}$$

where hats denote imputed outcomes. As seen from Eq (3), the estimation uses twice as many observations as there are patients in the data. This means that observations are correlated and that this correlation needs to be considered in the inference. This is easily managed using a regression estimator in which the standard errors are estimated by clustering on individuals. An advantage of the regression estimator is that we can adjust for the bias due to inexact matching by adding covariates. Fig 1 shows the covariates used in the analysis.
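The logic of one-to-one matching with replacement in both directions can be sketched in a few lines of Python. This is a simplified illustration, not the genetic matching algorithm used in the study: the generalized Mahalanobis distance is replaced by a squared distance with fixed per-covariate weights, which are assumed to be given rather than searched for, and all names and data are hypothetical.

```python
def weighted_distance(a, b, weights):
    """Squared distance with per-covariate weights (diagonal simplification)."""
    return sum(w * (u - v) ** 2 for u, v, w in zip(a, b, weights))


def nearest_outcome(x, pool, weights):
    """Outcome of the closest unit in pool, a list of (covariates, outcome)."""
    closest = min(pool, key=lambda unit: weighted_distance(x, unit[0], weights))
    return closest[1]


def matching_estimate(treated, controls, weights):
    """Matching with replacement in both directions, then averaging.

    For each treated unit the missing control outcome is imputed from the
    nearest control; for each control the missing treated outcome is
    imputed from the nearest treated unit.
    """
    diffs = [y - nearest_outcome(x, controls, weights) for x, y in treated]
    diffs += [nearest_outcome(x, treated, weights) - y for x, y in controls]
    return sum(diffs) / len(diffs)


# Hypothetical data with a single covariate.
treated = [((0.0,), 1.0), ((1.0,), 0.0)]
controls = [((0.1,), 0.0), ((0.9,), 0.0), ((2.0,), 1.0)]
estimate = matching_estimate(treated, controls, weights=(1.0,))
```

Because matching is with replacement, the treated unit at 1.0 serves as the match for two controls, which is the source of the correlation between observations noted above.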
We use the OLS estimator and estimate a regression of $\tilde{Y}_h$, $h = 1, \dots, 2n$, on the treatment indicator and the covariates, where $\bar{x}$ is the sample mean of the covariates, $n = n_0 + n_1$, and $\tilde{Y}_h$ is either the observed outcome or the imputed outcome. The OLS estimate of the coefficient on the treatment indicator, $\hat{\tau}$, is the estimate of the CATE. The standard errors are estimated by clustering at the individual level.
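To show what clustering at the individual level does, the following sketch covers the special case of a regression on the treatment indicator only, without covariate adjustment, so it is narrower than the regression described above. In that case the OLS slope is the difference in means, and the cluster-robust variance (without small-sample correction) reduces to a sum of squared per-patient score contributions. The function name and data are hypothetical.

```python
from collections import defaultdict
import math


def clustered_diff_in_means(rows):
    """rows: (patient_id, w, y) for the stacked matched sample.

    Each row is an observed or imputed outcome; patient_id identifies the
    patient whose outcome it is, so a patient reused as a match appears
    in several rows. Returns the OLS slope on w and its cluster-robust
    standard error (no small-sample correction).
    """
    n1 = sum(1 for _, w, _ in rows if w == 1)
    n0 = len(rows) - n1
    mean1 = sum(y for _, w, y in rows if w == 1) / n1
    mean0 = sum(y for _, w, y in rows if w == 0) / n0
    scores = defaultdict(float)
    for pid, w, y in rows:
        if w == 1:
            scores[pid] += (y - mean1) / n1
        else:
            scores[pid] -= (y - mean0) / n0
    se = math.sqrt(sum(s * s for s in scores.values()))
    return mean1 - mean0, se


# Hypothetical stacked sample: patient 3 is used twice as a match.
rows = [(1, 1, 1.0), (2, 1, 0.0), (3, 0, 0.0), (3, 0, 0.0)]
tau_hat, se = clustered_diff_in_means(rows)
```

Residuals from rows belonging to the same patient are summed before squaring, so repeated use of a patient does not count as independent information.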
To obtain a summary measure for the mortality outcome and to handle censoring, we also estimated a discrete-time Cox regression model. That is, we let

$$\Pr(Y_{it} = 1 \mid x_i, W_i, Y_{it-1} = 0) = \lambda_{it}(\theta),$$

where $T$ is the maximum number of months and $\theta = (\gamma_1, \dots, \gamma_{T-1}, \beta, \tau)$. Note that in this analysis we are using the 4,516 observations in the matched sample only.
Based on the maximum likelihood (ML) estimates, $\hat{\theta}$, we then estimate the survival function up to any month $t$ for the AA and ENZ patients, respectively. The estimated survival functions up to $t$, $S(1; t \mid \hat{\theta})$ and $S(0; t \mid \hat{\theta})$ (Eq 5), are computed over $R_{1t}$ and $R_{0t}$, the risk sets of the $n_{1t}$ and $n_{0t}$ patients who have not yet died in period $t$, respectively. The overall effect on survival up to a given period $\bar{T}$ is then estimated as

$$S(1; t \mid \hat{\theta}) - S(0; t \mid \hat{\theta}). \tag{6}$$
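The step from period-specific hazards to survival curves can be illustrated with a generic Python sketch. It uses only the standard identity that survival up to month t is the product of one minus the hazards up to t; it does not reproduce the risk-set averaging of Eq 5, and the hazard values below are hypothetical inputs rather than estimates from the study.

```python
def survival_curve(hazards):
    """S(t) = prod_{s <= t} (1 - hazard_s) for periods s = 1, ..., T."""
    curve, alive = [], 1.0
    for h in hazards:
        alive *= 1.0 - h
        curve.append(alive)
    return curve


def survival_difference(hazards_aa, hazards_enz):
    """Difference in survival, AA minus ENZ, for each month."""
    s1 = survival_curve(hazards_aa)
    s0 = survival_curve(hazards_enz)
    return [a - b for a, b in zip(s1, s0)]


# Hypothetical monthly hazards for two months.
diff = survival_difference([0.10, 0.10], [0.05, 0.05])
```

A negative difference at month t indicates lower survival up to that month for patients prescribed AA.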

Instrumental variable analysis
Almost always, researchers using IV estimators estimate the first-stage regression at the same time as conducting the analysis. Here, the first stage and the test for the relevance of the county instruments were made before observing our outcomes. This means that the following IV analysis is a design-based approach (cf. [33]).
The framework for IV analysis is based on the model of potential latent variables. Under modelling assumptions, this allows deriving the IV estimator as a maximum likelihood (ML) estimator. This section describes this framework and the resulting ML estimator. Details regarding covariate definitions are provided in Johansson et al. [9].
The potential problem in identifying an average treatment effect in the matched data is that doctors could prescribe AA or ENZ based on aspects of health that we cannot observe. To handle this potential problem, we specify a latent propensity for prescriptions. Let $q_{ic} = 1$ if individual $i$ is living in county $c$ and let $x_i$ be the covariates displayed in Fig 1. The latent propensity for prescriptions, $W_i^*$, is then modelled as a function of the county indicators $q_i = (q_{i1}, \dots, q_{i21})'$, with coefficients $\gamma_c = (\gamma_1, \dots, \gamma_{21})'$, the covariates $x_i$, and an error term $\varepsilon_i$. If $W_i^* > 0$, the patient is prescribed AA (i.e. $W_i = 1$), and ENZ otherwise. Under the further assumption that $\varepsilon_i$ is normally distributed, the probability of being prescribed AA is given by a probit model (Eq 8), in which $\Phi(\cdot)$ is the cumulative distribution function of the standard normal. This first stage regression was presented in Johansson et al. [9]. We tested for the relevance of $q_i$ (i.e. that $q_i$ affects prescription given the covariates), and assessed the exclusion restriction (i.e. no effect of $q_i$ on the outcome, except through $W_i$) by estimating the effect on Pain or SRE at the time of diagnosis as proxy outcomes. For completeness, we present the estimates from the probit estimation of (8), the test of relevance, and the assessment of the exclusion restriction in S2 Text.
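As a minimal illustration of the probit first stage, the sketch below computes a prescription probability from a county effect and covariates. The additive linear index is an assumption made for the illustration, since the exact specification is given in Johansson et al. [9] and is not reproduced here; the function names and coefficient values are hypothetical.

```python
import math


def std_normal_cdf(v):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def prob_prescribed_aa(county_effect, covariates, coefficients):
    """Probit probability of AA, assuming an additive linear index
    (county effect plus covariates times coefficients)."""
    index = county_effect + sum(b * x for b, x in zip(coefficients, covariates))
    return std_normal_cdf(index)


# Hypothetical patient: same covariates, two different county effects.
p_low = prob_prescribed_aa(-1.0, [0.5], [0.2])
p_high = prob_prescribed_aa(1.0, [0.5], [0.2])
```

Holding the covariates fixed, a larger county effect raises the probability of AA, which is the variation that the relevance test examines.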
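Let the unobserved health of individual $i$ at month $t > 0$ be specified separately for the case in which he is given ENZ and the case in which he is given AA. In these specifications, $\bar{x}$ is the mean vector of the covariates in the sample, and $u_{i0t}$ and $u_{i1t}$ are the respective error terms.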
The unobserved health given the prescribed drug can then be formulated as a function of the error terms and the observed covariates. With this specification, $\delta_{1t} = \beta_{1t} - \beta_{0t}$ is the average treatment effect at month $t$ on the latent outcome, and the vector $\delta_{\Delta t}$ contains heterogeneous effects, centred around $\delta_{1t}$, with respect to the covariates.
The potential problem in the matched design is now handled in this model by letting $u_{i0t} = \rho_{0t}\varepsilon_i + \eta_{i0t}$ and $u_{i1t} = \rho_{1t}\varepsilon_i + \eta_{i1t}$, where $\eta_{i0t}$ and $\eta_{i1t}$ are both random. The unobserved health thus depends on the error term $\varepsilon_i$ of the prescription equation through $\rho_{0t}$ and $\rho_{1t}$. Note that with this specification, we assume that there is no unobserved heterogeneity in the effects correlated with the county factor IV. This means that under model assumptions, we identify the average treatment effect instead of the complier treatment effect.
We observe only the outcome indicator and the prescribed drug, not the latent health. Under the assumptions that $\varepsilon_i$ is independent of $z_i$ (i.e. the exclusion restriction) and that $\varepsilon_i$, $\eta_{i1t}$ and $\eta_{i0t}$ are standard normal, the probabilities of the observed outcomes for AA and ENZ patients can be expressed as integrals in which $\varepsilon_i$ is a dummy argument of integration, with parameter vectors $\theta_{1t} = (\beta_{0t}, \delta_t', \delta_{1t}, \delta_{\Delta t}', \gamma', \rho_{1t})'$ and $\theta_{0t} = (\beta_{0t}, \delta_t', \gamma', \rho_{0t})'$, respectively. The likelihood is then maximized with respect to $\theta_t = (\beta_{0t}, \delta_t', \delta_{1t}, \delta_{\Delta t}', \gamma', \rho_{1t}, \rho_{0t})'$. For each of the periods $t = 1, \dots, T$, we estimate the conditional individual treatment effect by evaluating the model at the maximum likelihood estimates. The conditional average treatment effect at each month $t$ is then estimated from these individual effects.

This framework is easily extended to a discrete-time survival analysis model for the mortality outcome. We now model the probability that patient $i$ prescribed AA dies at month $t$, given survival up to this month, and the corresponding conditional probability if the patient was instead prescribed ENZ. The parameter vectors are then $\theta_{1t} = (\beta_t, \delta', \delta_1, \delta_\Delta', \gamma', \rho_{1t})'$ and $\theta_{0t} = (\beta_t, \delta', \gamma', \rho_{0t})'$. The term $\beta_t + \delta' x_i$ measures the baseline conditional probability of dying, given survival up to month $t$, while $\delta_1$ and $\delta_\Delta'(x_i - \bar{x})$ measure the 'shift' in this baseline probability, i.e. an effect. We restrict the effect on the conditional probability to be the same at all months.
Let $\beta = (\beta_1, \dots, \beta_{T-1})'$, where $T$ is the last follow-up month, and let $T_i$ be the number of months the individual is alive; thus $T_i = T$ if the individual is alive when we end the study. The likelihood is then maximized with respect to the parameter vector $\theta$, which includes $\beta$. Based on the ML estimates, we then estimate the survival function up to any month $t$ for the AA and ENZ patients (cf. Eq 5). The overall effect on survival up to a given period $\bar{T}$ is estimated using Eq 6.
For the estimation of these models, we use the algorithm described in Huntington-Klein [34]. Confidence intervals of the estimand of interest are estimated using the bootstrap percentile method.
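The bootstrap percentile method can be sketched generically as follows. This is an illustration only: it resamples whole units with replacement and applies an arbitrary statistic, whereas the study applies the method to the ML estimates of the IV model. The function name, the number of replications, and the example data are hypothetical.

```python
import random


def percentile_ci(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Bootstrap percentile confidence interval for statistic(data).

    data: list of units (for example, one entry per patient), resampled
          with replacement so that each unit stays intact.
    """
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[round((1 - level) / 2 * (n_boot - 1))]
    upper = stats[round((1 + level) / 2 * (n_boot - 1))]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


sample = list(range(1, 21))  # hypothetical data with mean 10.5
ci = percentile_ci(sample, mean)
```

The interval endpoints are simply the empirical quantiles of the bootstrap replications, so no distributional assumption is imposed on the estimator.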

Mortality
The results for the matched sample and IV analysis are presented together. The results from the matched sample regression suggest a higher mortality rate from being prescribed AA than ENZ. The point estimates for the IV estimator are very similar. However, the confidence intervals are substantially wider (around four times as wide), explaining why no estimate is statistically significant.
The effects on survival time up to 48 months are, for both analyses, presented in Fig 3. Both show the same pattern: a clear reduction in survival rates for patients prescribed AA in contrast to ENZ. The overall effect on survival up to 48 months is estimated to be $\Delta^{m}_{48} = -0.38$ and $\Delta^{IV}_{48} = -0.25$ in the matched sample and IV analysis, respectively. All of the estimates are statistically significant for the matched sample regression. As the confidence intervals are around twice as wide for the IV estimator as for the matched sample regression, the IV estimates are not statistically significant for short and long survival times.

Sensitivity analysis
As discussed previously, we estimate the comparative effectiveness of participants treated as intended (intention-to-treat approach). Under the assumption that switching prescriptions among the NHTs is random, conditional on our observed covariates, we can also estimate the comparative effectiveness for participants following the protocol by excluding those who switch treatments from the analysis. This per-protocol analysis excludes 200 patients who were first prescribed AA and later prescribed ENZ, and 285 who made the reverse switch, i.e., among those 1,110 and 3,406 patients who were first prescribed AA and ENZ, respectively. This means that we exclude 18 percent and 8 percent of those first prescribed AA and ENZ, respectively.
The per-protocol analysis again shows a higher mortality rate from being prescribed AA than ENZ. The mortality differences in the matched analysis are more pronounced than in the intention-to-treat analysis, and the number of statistically significant effects is now 43 out of the 48 estimates (see S1 Fig). For the IV analysis, the results are almost identical to the previous results and are therefore not included in the Supporting information. Consequently, under a stronger assumption, we can conclude that mortality is higher on AA than on ENZ if the patients follow the protocol.
The results for patients with a long (panel a) and short (panel b) waiting time to treatment are presented in Fig 6. Patients with a long waiting time for treatment display a clear reduction in mortality if given ENZ instead of AA, while no difference in effect is seen for patients with a short waiting time.
Results from the IV analysis are similar; however, the confidence intervals are substantially wider, which is why we do not present the results from these sub-analyses.

Skeleton-related events and severe pain
The SRE analysis results are presented in Fig 7. The Fig displays the point estimates and the 95% Bonferroni corrected confidence intervals over 24 months. We find five statistically significant effects in the matched regression (months 11, 13, 14, 23 and 28).
All these effects are negative, suggesting fewer SREs from being prescribed AA than ENZ. The results from the IV analysis vary substantially over the months. However, the results should be interpreted with caution due to the small number of SRE patients.
A potential problem with this analysis is that an SRE is only observed in the data if the patient is alive. In the analysis for the matched regression sample, we excluded dead patients up to the month of the analysis. As a sensitivity analysis, we estimate the bounds of potential effects. We let all patients who die have either no morbidity outcome or a morbidity outcome (i.e., SRE = 0 or SRE = 1, respectively). If the mortality with AA is higher than with ENZ, the first case with SRE = 0 provides a lower bound estimate of the effect on SRE, while the second provides an upper bound estimate.
Panel (a) of Fig 8 presents the results from the lower bound analysis, while panel (b) presents the results from the upper bound analysis. As very few of the lower bound estimates are statistically significant and negative, while all upper bound estimates are positive and often statistically significant, these results indicate an increase in SRE if prescribed AA rather than ENZ.
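The bounding idea above can be illustrated with a small sketch. It is a simplified illustration, not the paper's regression-based analysis: it compares raw SRE shares between arms, and the helper names (`make_arm`, `sre_rate`, `effect_bounds`) and all patient counts are hypothetical.

```python
from statistics import fmean


def make_arm(n_dead, n_alive_sre, n_alive_no_sre):
    """Build a toy treatment arm; SRE is unobserved (None) for dead patients."""
    return (
        [{"dead": True, "sre": None}] * n_dead
        + [{"dead": False, "sre": 1}] * n_alive_sre
        + [{"dead": False, "sre": 0}] * n_alive_no_sre
    )


def sre_rate(patients, fill_for_dead):
    """Share with an SRE after imputing `fill_for_dead` for patients who died."""
    return fmean([fill_for_dead if p["dead"] else p["sre"] for p in patients])


def effect_bounds(aa, enz):
    """Return (lower, upper) bounds on the AA minus ENZ difference in SRE shares."""
    lower = sre_rate(aa, 0) - sre_rate(enz, 0)  # every dead patient: SRE = 0
    upper = sre_rate(aa, 1) - sre_rate(enz, 1)  # every dead patient: SRE = 1
    return lower, upper


# Hypothetical counts: mortality is higher with AA (30%) than with ENZ (20%).
aa_arm = make_arm(n_dead=30, n_alive_sre=7, n_alive_no_sre=63)
enz_arm = make_arm(n_dead=20, n_alive_sre=10, n_alive_no_sre=70)

lower, upper = effect_bounds(aa_arm, enz_arm)
print(f"lower bound = {lower:.2f}, upper bound = {upper:.2f}")
```

Because more AA patients die in this toy example, imputing SRE = 1 for the dead raises the AA share by more than the ENZ share, which is why the second imputation gives the upper bound.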
The three sub-analyses (responders, aggressiveness, and time of prescription) on the SRE were also detailed in Johansson et al. [8]. The point estimates are, as in the main analysis, negative for all groups. There is some weak evidence of a more negative effect for high responders (see Fig A in S3 Text) and patients with short waiting times (see Fig B in S3 Text).
As only 13 patients have severe pain according to our definition of Pain, we get, as expected, no statistically significant results. For completeness, the results for the matched sample regression estimator are presented in S2 Fig. The pre-analysis plan stated that sub-analyses should also be made on Pain; with only 13 affected patients, however, sub-analyses of this outcome are not meaningful.

Assessing the assumptions and sensitivity analyses
At the same time as the mortality data were added, data from the NPCR were added to the analysis sample. These data contain more detailed information on patients' health concerning prostate cancer and allow us to assess the assumptions for the matched sample and IV analysis by estimating placebo effects on three covariates judged by specialists to be important confounders. These are PSA levels (SPSA), Gleason score (GleasSa), and metastases (Mstad). All covariates were measured at the time of the prostate cancer diagnosis. With three pre-measured covariates, we adjust the significance level for the individual tests using Bonferroni correction based on a five percent overall level, as in the main analysis.
The idea is that if there are statistically significant effects on these covariates when using the same regression analysis used in the main analysis, this suggests that available data from the population register are insufficient to control for confounding bias.
The NPCR does not have full coverage, and there are partially missing data on the covariates. For example, there are 170, 437 and 1,176 missing observations for SPSA, GleasSa and Mstad, respectively. We treat the missingness as random and remove the affected observations from the analysis. The reason is that, conditional on the covariates, we found no association between missing data and the treatment indicator (see S4 Text).
The results from the analysis for the matched sample regression and the IV analysis are displayed in Table 3. None of the estimated effects are statistically significant at the 1.67% level or, for that matter, the 5% level. Thus, these sensitivity analyses provide quite strong support for a causal interpretation of the estimated effects. One drawback of the assessment is that these variables are measured at the time of diagnosis rather than at the time of prescription. Unfortunately, measurements at the time of prescription are not available in the NPCR data. However, these tests concern unconfoundedness in general and so, indirectly, also PSA at the time of prescription.
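The logic of the placebo tests and the Bonferroni adjustment (5% / 3 ≈ 1.67%) can be sketched as follows. This is a deliberately simplified illustration: it replaces the paper's regression and IV estimators with a two-sample z-test on a difference in means, and the covariate values are hypothetical toy numbers, not NPCR data. Only the covariate names are taken from the text.

```python
import math
from statistics import NormalDist, fmean, variance

ALPHA_OVERALL = 0.05
N_PLACEBO_TESTS = 3  # SPSA, GleasSa, Mstad
ALPHA_EACH = ALPHA_OVERALL / N_PLACEBO_TESTS  # about 0.0167, i.e. the 1.67% level


def placebo_p_value(treated, control):
    """Two-sided z-test p-value for a difference in means of a pre-treatment covariate."""
    diff = fmean(treated) - fmean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical toy values: (AA patients, ENZ patients) for each covariate.
placebo_data = {
    "SPSA": ([1, 2, 3, 4, 5] * 20, [1, 2, 3, 4, 5] * 30),  # balanced
    "GleasSa": ([6, 7, 8, 9] * 25, [6, 7, 8, 9] * 40),     # balanced
    "Mstad": ([0, 1] * 50, [0, 0, 0, 1] * 40),             # imbalanced on purpose
}

results = {}
for name, (aa_values, enz_values) in placebo_data.items():
    p = placebo_p_value(aa_values, enz_values)
    # A rejection means the "effect" on a pre-treatment covariate is non-zero,
    # which would signal residual confounding.
    results[name] = (p, p < ALPHA_EACH)
    print(f"{name}: p = {p:.4f}, rejected at {ALPHA_EACH:.4f}: {p < ALPHA_EACH}")
```

In the toy data only the deliberately imbalanced covariate is rejected. In the paper, none of the three placebo effects is significant, which is the pattern expected when the observed covariates suffice to control for confounding.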

5.3.1 Rosenbaum's sensitivity analysis for hidden bias.
Rosenbaum's sensitivity analysis for hidden bias, for a matched binary outcome, is conducted for the 26 significant effects from the matched sample regression [35].
From S3 Fig, we can see, for example, that for time period 4 the upper bound for the sensitivity analysis is Γ = 1.22. This indicates that the confidence interval for the effect would include zero if an unobserved variable caused the odds of treatment assignment to differ between the treatment and comparison groups by a factor of 1.22. On average, over all 26 significant effects, the upper bound Γ is 1.14.
Results from this sensitivity analysis suggest that we should interpret the results cautiously. However, taking the placebo regressions and the similar results from the IV estimator together, the potential bias in the matched sample regressions appears small.
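To make the role of Γ concrete, the following sketch computes Rosenbaum-type bounds for matched pairs with a binary outcome, using the textbook McNemar-type form: among discordant pairs, the worst-case probability that the treated unit is the one with the event is Γ / (1 + Γ). This is an assumption about the form of the test, not a reproduction of the paper's implementation, and the pair counts and function names are hypothetical.

```python
from math import comb


def upper_p_value(n_discordant, n_treated_events, gamma):
    """Worst-case one-sided p-value under hidden bias of magnitude gamma.

    Among discordant pairs, the number of pairs where the treated unit has the
    event is bounded by a Binomial(n_discordant, gamma / (1 + gamma)) variable.
    """
    p_plus = gamma / (1 + gamma)
    return sum(
        comb(n_discordant, k) * p_plus**k * (1 - p_plus) ** (n_discordant - k)
        for k in range(n_treated_events, n_discordant + 1)
    )


def critical_gamma(n_discordant, n_treated_events, alpha=0.05, step=0.01, max_steps=900):
    """Smallest gamma on a grid at which the worst-case p-value exceeds alpha."""
    for i in range(max_steps + 1):
        gamma = 1.0 + i * step
        if upper_p_value(n_discordant, n_treated_events, gamma) > alpha:
            return round(gamma, 2)
    return None  # significance survives every gamma on the grid


# Hypothetical example: 200 discordant pairs, the AA patient died in 120 of them.
p_no_bias = upper_p_value(200, 120, gamma=1.0)  # gamma = 1: no hidden bias
gamma_star = critical_gamma(200, 120)
print(f"p-value without hidden bias: {p_no_bias:.4f}")
print(f"result stops being significant at gamma of about {gamma_star}")
```

A critical Γ close to 1, such as the average of 1.14 reported above, means that a fairly weak unobserved confounder could in principle overturn significance, which is why the other assessments are needed to judge the remaining bias.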

Exploratory analysis
Johansson et al. [8] describe exploratory analyses on prostate-specific mortality and compliance with the treatment. The results from the analysis are in agreement with the results from overall mortality, but with wider confidence intervals. The results of the analysis on compliance are not statistically significant at any reasonable level of risk. For these two reasons, we defer the discussion and the presentation of the results to S5 Text.

Discussion
As a potential input to the discussion on the use of pre-published protocols in the analysis of Real World Data (RWD), this paper has illustrated that an observational study based on a pre-published protocol can entail the same level of detail as a protocol for an RCT. To this end, we present the results from a comparative effectiveness evaluation of abiraterone acetate (AA) against enzalutamide (ENZ) in clinical practice, two cancer drugs prescribed to patients with advanced prostate cancer.
Based on two complementary models, we have estimated effects on all-cause mortality and two morbidity outcomes (skeleton-related events and severe pain). The designs and pre-specified analyses are described in the two pre-analysis plans [8,9].
The results from the two analyses (matched sampling analysis and IV analysis) both show an increased mortality risk from prescribing AA compared to ENZ. These results support the findings in Tagawa et al. [25] and Schoen et al. [26]. In addition, the matched sampling analysis suggests an increased risk of skeleton-related events.
Inference from observational studies may suffer from many forms of bias. One concerns the potential of researchers being subjective. In a model-based analysis of the outcomes, the researcher may adjust the model due to surprising results and, generally, bias the results from the analysis. This form of subjectivity bias is less of a problem with design-based studies [cf. 36] and does not exist at all as long as all relevant information can be included in a pre-analysis plan. This is the same as for an RCT.
The fact that the analysis is objective, however, does not mean that the inference is valid. Inferences from all observational analyses may be biased. One must always recognize the possibility that unobserved confounders are not balanced or, in the case of the IV design, reflect on unsubstantiated model assumptions.
The validity of the inference is assessed based primarily on the auxiliary data from the National Prostate Cancer Register. The identifying assumptions in both designs in this paper could not be rejected. In addition, as the results from the two analyses are qualitatively very similar, the findings should be of interest to the health-care profession.

Fig 1. Balance of the main covariates in the design. The propensity score is estimated using a logit model and using LASSO regression on all possible covariates. The factors are derived from an exploratory factor analysis on 130 continuous covariates. Factor 1: Welfare and social security benefits; Factor 2: Wages and disposable income; Factor 3: Occupational pensions; Factor 4: Early retirement benefits and welfare; Factor 5: Sickness and unemployment benefits; Factor 6: Private pensions; Factor 7: Health before treatment; Factor 8: Income at diagnosis; and Factor 9: Income from business. https://doi.org/10.1371/journal.pone.0293000.g001

Fig 2. The Fig displays the point estimates and the 95% Bonferroni corrected confidence intervals for 48 periods of 30 days, in the following denoted months. From the Fig, we can see 26 statistically significant effects, all from the matched sample regression.

5.1.2 Sub-analyses.
As detailed in Johansson et al. [8], three sub-analyses (responders, aggressiveness, and waiting time) on mortality were suggested for the matched sample analysis. Patients previously given hormone treatment for 12 months or more are defined as being high responders. Patients with visceral metastases in conjunction with hospitalization (ICD-10

Fig 6. CATE mortality estimates (•) and 95% Bonferroni corrected confidence intervals. Long and short waiting times. (a) Long waiting time, i.e., patients with a longer period between diagnosis of prostate cancer and prescription of AA or ENZ than the median waiting time of 5.9 years for both drugs. (b) Short waiting time, i.e., patients with a shorter period between diagnosis of prostate cancer and prescription of AA or ENZ than the median waiting time of 5.9 years for both drugs. https://doi.org/10.1371/journal.pone.0293000.g006
See also Fig A in S3 Text and, for patients with short waiting times, Fig B in S3 Text.

Table 3. Results from the placebo regressions.
Note: OLS cluster robust standard errors and IV bootstrap percentile method. *Standardized to have mean zero and unit variance. https://doi.org/10.1371/journal.pone.0293000.t003